Abstract

Clusters of galaxies are the most massive objects in the Universe and mapping their location
is an important astronomical problem. This paper describes an algorithm (based on statistical
signal processing methods), a software architecture (based on a hybrid layered approach) and
a parallelization scheme (based on a client/server model) for finding clusters of galaxies in
large astronomical databases. The Adaptive Matched Filter (AMF) algorithm presented here
identifies clusters by finding the peaks in a cluster likelihood map generated by convolving a
galaxy survey with a filter based on a cluster model and a background model. The method
has proved successful in identifying clusters in real and simulated data. The implementation is
flexible and readily executed in parallel on a network of workstations.

1 Introduction

Clusters of galaxies are the largest objects known to humans (see Figure 1). They are the
"mountains" of the Cosmos, and like terrestrial mountains they lie in great ranges that define the
cosmic "continents" and "oceans" (see Figure 2). Mapping clusters of galaxies is very much akin to
surveying our own world and allows us to understand the creation, evolution and eventual fate of
our Universe [Bahcall 1988].

The process by which astronomers detect clusters of galaxies begins with assembling large 
images of the sky, which are the result of hundreds of nights of observing through a telescope. 
These pictures are analyzed to produce a database of galaxies $X = \{x_i : i = 1, \ldots, N_X\}$, where
$N_X \sim 10^8$. Each record $x_i \in X$ consists of a position on the sky, brightness measurements in
one or more bands and possibly hundreds of additional measurements describing the shape and 
composition of the galaxy. 

Clusters are local density peaks in the three dimensional distribution of galaxies across the 
Universe. In 3D data, clusters are easy to detect. Unfortunately, the majority of distances to 
individual galaxies are not known and can only be inferred statistically from empirical models of 
their brightness. Thus, it is difficult to differentiate small nearby clusters from large far away 
clusters. The goal of cluster detection and estimation is to create a catalog, $\Theta$, consisting of
thousands of clusters $\Theta = \{\theta_i : i = 1, \ldots, N_\Theta\}$, where $N_\Theta \sim 10^4$. Each cluster in this list,
$\theta_i$, consists of a position on the sky, a distance estimate, a size estimate and perhaps additional
estimated properties of the cluster. 

The first catalog of galaxy clusters was compiled by Abell [Abell 1958], and has proved extremely
useful to astronomers over the past four decades. Abell's catalog was created by visually
inspecting hundreds of photographic plates taken from the first Palomar Observatory Sky Survey 
(POSS). Modern galaxy databases are too large for such methods to be used today. Subsequent 
efforts to detect clusters have relied on Matched Filter techniques taken from statistical signal
processing (e.g., [Lumsden et al 1992], [Dalton et al 1994], [Postman et al 1996], [Kawasaki et al 1997]
and [Bramel et al 2000]). These methods have a strong mathematical foundation, but require extensive
prior information and are often computationally prohibitive as they test every possible location
in the domain of the cluster space $\Omega_\Theta$ for the presence of a cluster. The Adaptive Matched Filter
[Kepner et al 1999] that is described later in this paper is a variation on the Matched Filter that
uses a hierarchical set of filters, as well as software coding and parallel computing techniques that 
address some of the Matched Filter's drawbacks. The AMF is adaptive in two ways. First, the 
AMF uses a two step approach that first applies a coarse filter to find the clusters and then a 
fine filter to provide more precise estimates of the distance and size of each cluster. Second, the 
AMF uses the location of the data points as a "naturally" adaptive grid to ensure sufficient spatial 
resolution. 

A variety of other techniques have also been applied to the cluster finding problem. The compact
nature of clusters makes Wavelet based signal processing approaches an appealing alternative
[Fang & Pando 1997] and [Fadda et al 1997]. Geometric approaches such as Voronoi tessellation
[Ramella 1995] have also been used. In this method each $x_i$ is the seed for the tessellation. Clusters
are then found by computing the volume of each cell and selecting the points with the
smallest volume, which presumably have the highest density. Such geometric methods have the
advantage that they require very little prior information. 

The Matched Filter, Adaptive Matched Filter, Wavelet and Voronoi Tessellation approaches all
use the three to five high affinity dimensions of $X$ (i.e., angular position and brightness measurements).
These dimensions are continuous real variables that lend themselves to Euclidean distance
metrics. Working in these lower dimensions allows more compute intensive techniques, which are
necessary to de-project clusters from the observed data domain $\Omega_X$ to the desired underlying
domain in which clusters exist, $\Omega_\Theta$, i.e., angular position, distance and size. More recently, there has
been interest in exploiting the low affinity dimensions that are also available in galaxy databases
[Djorgovski et al 1997, Gal et al 1999] to enhance detection. As the understanding of these methods
increases, the exploitation of many dimensions should be possible using advanced data mining
techniques (see e.g. [Fayyad et al 1996] and [Dasarathy 1999] and references therein). These methods
have enormous potential for detecting new clusters and possibly separating them into distinct
groups, thus revealing new classes of galaxy clusters.

The rest of this paper presents in greater detail the AMF algorithm, its implementation and 
results. In section two a detailed derivation of the AMF is given. The derivation is meant to be 
sufficiently general that it can lend itself to other types of databases. Section three presents the 
implementation of the AMF using a layered software architecture and a client/server parallelization
model. Again, these methods are not limited to the specific problem presented here and are
applicable to a variety of areas. In section four the results of applying the AMF to simulated and
real data are discussed. Finally, section five gives the summary and conclusions.

2 Adaptive Matched Filter Algorithm



Matched filter techniques are widely used in statistical signal processing. The idea is to convolve 
the data with a model of the desired signal. In many instances this can be shown to be the optimal
detection method in the least squares sense. Applying matched filter techniques to point set data is
less common but has become the standard method for detecting clusters of galaxies. The advantage
of matched filter techniques is that they are mathematically rigorous, provide well defined selection 
criteria and produce few false detections. 



The Adaptive Matched Filter [Kepner et al 1999] enhances the matched filter method by creating
a pair of filters (each correct under its own assumptions) which can be used to trade off
computational complexity versus sensitivity. The filters are derived by computing the likelihood that a
cluster exists at a particular point $\theta \in \Omega_\Theta$ given the data $X$. Various likelihood functions can be
derived; the differences are due to the additional assumptions that are made about the distribution
of the data. This section gives the mathematical derivation of the two likelihood functions used in
the AMF: $\mathcal{L}_{\rm coarse}$ and $\mathcal{L}_{\rm fine}$. Both derivations are conceptually based on virtually binning the data,
but make different assumptions about the distribution of points in the virtual bins.

Imagine dividing up the data domain $\Omega_X$ into bins. We assign to each bin a unique index $j$.
The expected number of data points in bin $j$ given that there is a cluster at $\theta$ is denoted $n^j_{\rm model}(\theta)$.
The number of data points actually found in bin $j$ is $n^j_{\rm data}$. In general, the probability of finding
$n^j_{\rm data}$ data points in cell $j$ is given by a Poisson distribution

$$ P_j(\theta) = \frac{(n^j_{\rm model})^{n^j_{\rm data}} \, e^{-n^j_{\rm model}}}{n^j_{\rm data}!} . \tag{1} $$

The likelihood of the data given the model is computed from the sum of the logs of the individual
probabilities

$$ \mathcal{L} = \sum_j \ln P_j(\theta) . \tag{2} $$
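The binned Poisson likelihood can be illustrated with a few lines of Python. This is a minimal sketch rather than the AMF implementation: the function name, the bin counts and the two model vectors are all hypothetical, chosen only to show that a model which places extra expected counts where the data are dense receives a higher log-likelihood.

```python
import math


def poisson_log_likelihood(n_data, n_model):
    """Sum of ln P_j over bins, where P_j is the Poisson probability of
    observing n_data[j] points when n_model[j] points are expected."""
    total = 0.0
    for observed, expected in zip(n_data, n_model):
        # ln P = n ln(mu) - mu - ln(n!), with ln(n!) = lgamma(n + 1)
        total += (observed * math.log(expected)
                  - expected
                  - math.lgamma(observed + 1))
    return total


# Hypothetical observed counts in four bins.
counts = [3, 0, 5, 2]
# Hypothetical models: uniform background, and background plus a cluster.
background_only = [2.0, 2.0, 2.0, 2.0]
with_cluster = [2.5, 2.0, 4.5, 2.0]

print(poisson_log_likelihood(counts, background_only))
print(poisson_log_likelihood(counts, with_cluster))
```

Evaluating the same sum for many candidate cluster positions is what produces a likelihood map; the two approximations that follow exist to make that evaluation cheap.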

2.1 Coarse Grained $\mathcal{L}$

If the virtual bins are made big enough that there are many galaxies in each bin, then the probability
distribution can be approximated by a Gaussian

$$ P_j(\theta) = \frac{1}{\sqrt{2\pi n^j_{\rm model}}} \exp\left\{ -\frac{(n^j_{\rm data} - n^j_{\rm model})^2}{2 n^j_{\rm model}} \right\} . \tag{3} $$

Furthermore, let the model distribution consist of a background field (that is independent of $\theta$) and
a cluster component (that depends on $\theta$)

$$ n^j_{\rm model}(\theta) = n^j_f + n^j_c(\theta) . \tag{4} $$

If the field contribution is approximately uniform and large enough to dominate the noise then

$$ P_j(\theta) = \frac{1}{\sqrt{2\pi n^j_f}} \exp\left\{ -\frac{(n^j_{\rm data} - n^j_{\rm model})^2}{2 n^j_f} \right\} . \tag{5} $$

Summing the logs of these probabilities results in the following expression for the coarse likelihood

$$ \mathcal{L}_{\rm coarse}(\theta) = \sum_j \ln P_j = -\frac{1}{2} \sum_j \ln(2\pi n^j_f) - \frac{1}{2} \sum_j \frac{(n^j_{\rm data} - n^j_{\rm model})^2}{n^j_f} . \tag{6} $$

The first term is independent of $\theta$ and can be dropped, as can the overall factor of $1/2$, which does
not change the location of the peaks. In addition, if the bins can also be made
sufficiently small, then the sum over all the bins can be replaced by an integral

$$ \mathcal{L}_{\rm coarse}(\theta) = -\int_{\Omega_X} \frac{[n_{\rm data}(x) - n_{\rm model}(x;\theta)]^2}{n_f(x)} \, dx , \tag{7} $$

where $n^j_{\rm model}(\theta) = n_{\rm model}(x_j;\theta)\,dx$ and $n_{\rm data}(x)$ is a sum of Dirac delta functions corresponding to
the locations of the points $x_i$. Expanding the squared term and replacing $n_{\rm model}$ with $n_f + n_c(\theta)$
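yields

$$ \mathcal{L}_{\rm coarse}(\theta) = -\int_{\Omega_X} \frac{n_{\rm data}^2 - 2 n_{\rm data} n_f - 2 n_{\rm data} n_c + n_f^2 + 2 n_f n_c + n_c^2}{n_f} \, dx . \tag{8} $$

The above expression can be simplified by setting $\delta = n_c / n_f$, dropping all expressions that are
independent of $\theta$, and noting that $\int n_c(x;\theta)\,dx$ is small compared to the other terms, which leaves

$$ \mathcal{L}_{\rm coarse}(\theta) = 2 \sum_{i=1}^{N_X} \delta(x_i;\theta) - \int_{\Omega_X} \delta(x;\theta)\, n_c(x;\theta)\, dx . \tag{9} $$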
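To make the step from the expanded integrand to the final coarse likelihood explicit, each term of the numerator can be divided by $n_f$ and examined in turn. The following is only a restatement of the algebra described above, using $\delta = n_c / n_f$ and the fact that $n_{\rm data}(x)$ is a sum of Dirac delta functions at the points $x_i$:

```latex
\begin{align*}
-\int \frac{n_{\rm data}^2}{n_f}\,dx
  &: \text{independent of } \theta, \text{ dropped} \\
+\int 2\, n_{\rm data}\,dx
  &: \text{independent of } \theta, \text{ dropped} \\
+\int \frac{2\, n_{\rm data}\, n_c}{n_f}\,dx
  &= 2\int n_{\rm data}\,\delta\,dx
   = 2\sum_{i=1}^{N_X}\delta(x_i;\theta) \\
-\int n_f\,dx
  &: \text{independent of } \theta, \text{ dropped} \\
-\int 2\, n_c\,dx
  &: \text{small compared to the other terms, dropped} \\
-\int \frac{n_c^2}{n_f}\,dx
  &= -\int \delta\, n_c\,dx
\end{align*}
```

Only the third and sixth terms survive, which gives the two-term form of the coarse likelihood.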

2.2 Fine Grained $\mathcal{L}$

If the virtual bins are chosen to be sufficiently small that no bin contains more than one galaxy,
then the calculation of $\mathcal{L}$ can be significantly simplified because there are only two probabilities
that need to be computed. The probability of the empty bins

$$ P_{\rm empty} = e^{-n^j_{\rm model}} \tag{10} $$

and the probability of the filled bins

$$ P_{\rm filled} = n^j_{\rm model} \, e^{-n^j_{\rm model}} . \tag{11} $$

The sum of the log of the probabilities is then

$$ \mathcal{L}_{\rm fine} = \sum_{\rm empty} \ln P_{\rm empty} + \sum_{\rm filled} \ln P_{\rm filled} = -\sum_{\rm empty} n^j_{\rm model} - \sum_{\rm filled} n^j_{\rm model} + \sum_{\rm filled} \ln n^j_{\rm model} . \tag{12} $$

By definition summing over all the empty bins and all the filled bins is the same as summing
over all the bins. Thus, the first two terms in equation (12) are, up to sign, just the total number of points
predicted by the model

$$ \sum_{\rm all\ bins} n^j_{\rm model} = \int_{\Omega_X} n_{\rm model}(x;\theta)\, dx = N_{\rm model}(\theta) = N_f + N_c(\theta) . \tag{13} $$

$N_f$ and $N_c$ are the total number of field points and cluster points one expects to see inside $\Omega_X$;
they can be computed by integrating $n_f$ and $n_c$

$$ N_f = \int_{\Omega_X} n_f(x)\, dx , \qquad N_c(\theta) = \int_{\Omega_X} n_c(x;\theta)\, dx . \tag{14} $$

Because we retain complete freedom to locate the bins wherever we like, we can center all the
filled bins on the points $x_i$, in which case the third term in equation (12) becomes

$$ \sum_{\rm filled} \ln n^j_{\rm model} \rightarrow \sum_{i=1}^{N_X} \ln n_{\rm model}(x_i;\theta) = \sum_{i=1}^{N_X} \ln[n_f(x_i) + n_c(x_i;\theta)] , \tag{15} $$

and the sum is now carried out over all the points instead of all the filled bins. Combining these
results we can now write the likelihood in terms that are readily computable from the model and
the database

$$ \mathcal{L}_{\rm fine}(\theta) = -N_f - N_c(\theta) + \sum_i \ln[n_f(x_i) + n_c(x_i;\theta)] . \tag{16} $$

Substituting $\delta = n_c / n_f$ and dropping terms that are independent of $\theta$ gives

$$ \mathcal{L}_{\rm fine}(\theta) = -N_c(\theta) + \sum_i \ln[1 + \delta(x_i;\theta)] . \tag{17} $$
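Once the ratio $\delta = n_c / n_f$ has been evaluated at each galaxy position, the fine likelihood is a single pass over the data. The sketch below assumes those per-galaxy values and the expected cluster count $N_c$ are already available; the function and argument names are illustrative and are not taken from the AMF code.

```python
import math


def fine_likelihood(deltas, n_c_total):
    """Fine likelihood at one test location: -N_c + sum_i ln(1 + delta_i).

    deltas    -- values of n_c/n_f at each galaxy position (assumed given)
    n_c_total -- expected number of cluster galaxies in the survey area
    """
    # log1p keeps precision when delta is small, which is the common case
    # for galaxies far from the cluster center.
    return -n_c_total + sum(math.log1p(d) for d in deltas)


# Hypothetical example: three galaxies near the test location.
print(fine_likelihood([0.8, 0.3, 0.05], n_c_total=0.6))
```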



2.3 Application 

Both likelihood functions are applied to the data in a similar manner. A set of $N_\Theta^{\rm test}$ test locations
is chosen from the cluster domain $\Omega_\Theta$. The likelihood functions are then evaluated at each test
location to produce a likelihood map. The clusters correspond to the peaks in this map that
are above a specified threshold. In full generality, producing the likelihood map would require
$O(N_X N_\Theta^{\rm test})$ function evaluations. For a specific dataset, the model functions $n_f(x)$ and $n_c(x;\theta)$ are
constructed using prior empirical and theoretical knowledge of the data (see [Kepner et al 1999] for
the specific functions). From these models additional symmetries emerge which can be exploited
to significantly reduce the computations. For example, galaxy clusters have a finite angular size so 
at each test location only the small sub-set of data points which are near the test location need to 
be considered. 
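One simple way to restrict each evaluation to nearby galaxies is to sort the positions along one coordinate and use a binary search to cut out a strip before applying the radius test. The sketch below is an illustration under stated assumptions, not the method used in the AMF code: it treats positions as flat $(x, y)$ pairs in degrees (a small-field approximation), and the helper names are hypothetical.

```python
import bisect
import math


def build_index(points):
    """Sort (x, y) positions by x and keep the x values for binary search."""
    pts = sorted(points)
    xs = [p[0] for p in pts]
    return pts, xs


def neighbors_within(index, center, radius):
    """Return the points within `radius` of `center` (flat-sky distance)."""
    pts, xs = index
    # Only points in the strip |x - x0| <= radius can qualify.
    lo = bisect.bisect_left(xs, center[0] - radius)
    hi = bisect.bisect_right(xs, center[0] + radius)
    return [p for p in pts[lo:hi]
            if math.hypot(p[0] - center[0], p[1] - center[1]) <= radius]


# Hypothetical galaxy positions in degrees.
galaxies = [(0.0, 0.0), (0.1, 0.1), (0.5, 0.0), (0.05, -0.05), (3.0, 3.0)]
index = build_index(galaxies)
print(neighbors_within(index, center=(0.0, 0.0), radius=0.2))
```

With such an index the cost per test location depends on the local number of galaxies rather than on the size of the whole database.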

Another simplification comes from the fact that clusters have a shape that is roughly independent
of the total number of galaxies in the cluster

$$ n_c(x;\theta) \rightarrow \Lambda \, n_c(x;\theta) , \tag{18} $$

where $\theta \rightarrow (\theta, \Lambda)$, and $\Lambda$ parameterizes the size of the cluster. This simple modification allows the
coarse likelihood function to be re-written as

$$ \mathcal{L}_{\rm coarse}(\theta) = 2 \Lambda \sum_i \delta(x_i;\theta) - \Lambda^2 \int \delta(x;\theta)\, n_c(x;\theta)\, dx , \tag{19} $$

which can now be solved for $\Lambda$ by setting $\partial \mathcal{L} / \partial \Lambda = 0$

$$ \Lambda_{\rm coarse}(\theta) = \frac{\sum_i \delta(x_i;\theta)}{\int \delta(x;\theta)\, n_c(x;\theta)\, dx} . \tag{20} $$

Inserting this value back into the previous equation gives

$$ \mathcal{L}_{\rm coarse}(\theta) = \Lambda_{\rm coarse}(\theta) \sum_i \delta(x_i;\theta) . \tag{21} $$

The result of the above simplification is the elimination of one of the search dimensions, which 
results in a sizeable computational savings. 
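The closed-form size estimate can be checked numerically. In the sketch below, `sum_delta` stands for $\sum_i \delta(x_i;\theta)$ and `overlap_integral` for $\int \delta\, n_c\, dx$ at one test location; both names and the numerical values are hypothetical. The coarse likelihood is a downward parabola in $\Lambda$, so its maximum is available without any search.

```python
def coarse_profile(lam, sum_delta, overlap_integral):
    """Coarse likelihood at a fixed location as a function of size lam."""
    return 2.0 * lam * sum_delta - lam ** 2 * overlap_integral


def coarse_optimum(sum_delta, overlap_integral):
    """Closed-form maximum over lam: lam = S / I and likelihood = lam * S."""
    lam = sum_delta / overlap_integral
    return lam, lam * sum_delta


# Hypothetical values for a single test location.
sum_delta, overlap_integral = 6.0, 0.5
lam_best, l_best = coarse_optimum(sum_delta, overlap_integral)
print(lam_best, l_best)
# Neighboring sizes give a lower likelihood, as expected at a maximum.
print(coarse_profile(lam_best - 1.0, sum_delta, overlap_integral))
print(coarse_profile(lam_best + 1.0, sum_delta, overlap_integral))
```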

The same simplification when applied to the fine likelihood function gives

$$ \mathcal{L}_{\rm fine} = -\Lambda_{\rm fine} N_c + \sum_i \ln[1 + \Lambda_{\rm fine}\, \delta(x_i;\theta)] , \tag{22} $$

where $\Lambda_{\rm fine}$ is computed by solving

$$ N_c = \sum_i \frac{\delta(x_i;\theta)}{1 + \Lambda_{\rm fine}\, \delta(x_i;\theta)} . \tag{23} $$

While $\Lambda_{\rm coarse}$ can be obtained directly from $\mathcal{L}_{\rm coarse}$, $\Lambda_{\rm fine}$ can only be found by numerically finding
the zero point of the above equation. Furthermore, this equation does not lend itself to standard
derivative based solvers (e.g., Newton-Raphson) that produce accurate solutions in only a few
iterations. Fortunately, the solution can usually be bracketed in the range $0 < \Lambda_{\rm fine} < 1000$, thus
obtaining a solution with an accuracy $\Delta\Lambda \sim 1$ takes $\log_2(1000/1) \approx 10$ iterations using a bisection
method.
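A bisection solver for the fine size estimate takes only a few lines. The sketch below is an illustration, not the AMF code: the function name, the handling of the case where no root lies inside the bracket, and the example inputs are assumptions. It uses the bracket and accuracy quoted in the text and returns the iteration count so the "about 10 iterations" figure can be checked.

```python
def solve_lambda_fine(deltas, n_c, lo=0.0, hi=1000.0, tol=1.0):
    """Find lam with sum_i delta_i / (1 + lam * delta_i) = n_c by bisection.

    Returns (lam, iterations). If the root is not bracketed, the nearest
    end of the bracket is returned with zero iterations (an assumption
    made for this sketch).
    """
    def g(lam):
        # Decreasing in lam when all delta_i are positive.
        return sum(d / (1.0 + lam * d) for d in deltas) - n_c

    if g(lo) <= 0.0:
        return lo, 0
    if g(hi) >= 0.0:
        return hi, 0

    iterations = 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid   # root lies above mid
        else:
            hi = mid   # root lies at or below mid
        iterations += 1
    return 0.5 * (lo + hi), iterations


# Hypothetical single-galaxy case with a known root: 1/n_c - 1/delta = 99.
lam, iterations = solve_lambda_fine([1.0], n_c=0.01)
print(lam, iterations)
```

Because each iteration halves the bracket, the iteration count depends only on the bracket width and the tolerance, not on the data.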

Both the coarse and fine likelihood functions are able to exploit the specifics of the model to 
significantly reduce the number of test locations that need to be evaluated. The coarse likelihood 
function requires about 10 times less work to evaluate than the fine likelihood. Unfortunately, the 
underlying assumptions used in the derivation of the coarse likelihood function are not as accurate 
as those used to derive the fine likelihood. Thus, while the coarse likelihood is faster, the fine 
likelihood is more accurate (see Figure 3). The AMF addresses this issue by using both likelihood
functions in a two stage approach. First, the coarse likelihood function is applied and then the fine 
likelihood function is used on the peaks found in the coarse map. 

Using both filters sequentially not only produces the best estimate of the cluster locations, it has 
the added benefit of providing two quasi-independent sets of values for each cluster. This provides 
a helpful consistency check because the coarse and fine filter react differently at the detection 
limit. The coarse likelihood tends to assign weak detections to small nearby clusters, while the fine 
likelihood makes these detections large, far away clusters. Thus, if both likelihoods peak at similar 
size and distance estimates, then the detections are probably real, but if the two likelihoods peak 
at dramatically different values then the cluster is probably a false detection.



3 Implementation 

The likelihood functions derived in the previous section represent the core of the AMF cluster 
detection scheme. Both likelihood functions begin with picking a grid of test locations. The most 
straightforward method is to use a regularly spaced grid over $\Omega_\Theta$. Recall that each point in $\Omega_\Theta$
consists of an angular position, a distance and a cluster size and that each point in $X$ consists of
an angular position and a brightness. As shown in the previous section, the size can be determined
without searching, and a regular grid in distance will work reasonably well provided the steps are
sufficiently small (see Figure [?]). A regular grid in angle has the difficulty of making the grid too
coarse in dense regions and too fine in sparse regions (i.e., it is unnecessary to search for clusters where
there is no data). A better set of test locations is the angular positions of the data
$x_i$, which "naturally" provides an adaptive resolution.

3.1 Peak Finding



Finding peaks in a 3D regularly gridded map is straightforward. Finding the peaks in the irregularly 
gridded map is more difficult. There are several possible approaches. We present a simple method 
which is sufficient for selecting individual clusters. More sophisticated methods will be necessary 
in order to find small clusters that are close to large clusters. 

As a first step we eliminate all low likelihood points $\mathcal{L}_{\rm coarse} < \mathcal{L}_{\rm cut}$, where $\mathcal{L}_{\rm cut}$ is the nominal
detection limit, which is independent of richness or redshift. $\mathcal{L}_{\rm cut}$ can be estimated from the
distribution of the $\mathcal{L}_{\rm coarse}$ values. Step two consists of finding the largest value of $\mathcal{L}_{\rm coarse}$, which is
by definition the first and largest cluster $\theta_1$. The third step is to eliminate all test points that are
within a certain radius of the cluster. Repeating steps two and three until there are no points left
results in a complete cluster list $\Theta$. A different scheme would be to connect the irregularly gridded
points in a Voronoi tessellation [Ramella 1995] from which local maxima could be obtained in the
same manner as on a grid.
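The three-step procedure (threshold, take the maximum, clear its neighborhood, repeat) can be sketched as follows. This is a simplified illustration: test points are reduced to flat $(x, y, \mathcal{L})$ tuples with a Euclidean exclusion radius, whereas the real likelihood map also carries distance and size; the function name and sample values are hypothetical.

```python
import math


def find_peaks(points, l_cut, radius):
    """Greedy peak selection on an irregular set of test points.

    points -- list of (x, y, likelihood) tuples
    l_cut  -- detection threshold; points below it are discarded
    radius -- exclusion radius around each accepted peak
    """
    # Step 1: remove all points below the detection limit.
    remaining = [p for p in points if p[2] >= l_cut]
    clusters = []
    while remaining:
        # Step 2: the highest remaining likelihood is the next cluster.
        best = max(remaining, key=lambda p: p[2])
        clusters.append(best)
        # Step 3: drop every point within `radius` of it (including itself).
        remaining = [p for p in remaining
                     if math.hypot(p[0] - best[0], p[1] - best[1]) > radius]
    return clusters


# Hypothetical likelihood map with two clumps and one weak point.
test_points = [(0.0, 0.0, 10.0), (0.1, 0.0, 8.0),
               (1.0, 1.0, 6.0), (1.05, 1.0, 5.5),
               (2.0, 2.0, 1.0)]
print(find_peaks(test_points, l_cut=2.0, radius=0.3))
```

As the text notes, a fixed exclusion radius of this kind will suppress a small cluster that lies close to a large one.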



3.2 Software Architecture 

Implementation of the AMF cluster selection consists of four steps: (1) reading the database and 
the model parameter files, (2) computing $\mathcal{L}_{\rm coarse}$ over the entire database, (3) finding clusters
by identifying peaks in the $\mathcal{L}_{\rm coarse}$ map, and (4) evaluating $\mathcal{L}_{\rm fine}$ and obtaining a more precise
determination of each cluster's size and distance. 

The architecture of this data processing pipeline is shown in Figure 4. The software has been
designed so that it can accept both real and simulated data. One of the challenges of the AMF
is organizing the software so that it can readily accept new datasets and different parameter files.
Critical to adapting to new data is the ability to see into the system and observe each step as it takes
place. To address these issues the vast majority of the code has been implemented in an interpreted 
language (IDL from Research Systems, Inc.) which provides many mechanisms for reading in files 
and for monitoring and visualizing output. 

The computational driver of the application is the evaluation of $\mathcal{L}_{\rm coarse}$. This function consists
of a set of nested for loops which do not lend themselves to the vector notation required to get 
good performance in an interpreted language. Thus, while the interpreted code is used to set 
up the calculation, a compiled C routine is called to compute the coarse likelihood function (see 
Figure |5|). In addition to giving the superior compute performance of a compiled language, this 
layered software approach also provides a mechanism for exploiting parallel computing. 



3.3 Parallelization Scheme 

Computing the coarse likelihood map is a highly parallelizable operation. Each test point can be 
computed independently of the others if all the necessary data is available. There are a variety of 
ways to take advantage of this scheme. The one chosen here is a client/server approach based on
The Next generation Taskbag (TNT) software library [Kepner et al 2000]. TNT is a client/server
based Applications Programming Interface (API) for distributing and managing multiple tasks on
a Network-Of-Workstations (NOW). TNT is a C based library which can be used in any compiled
program. As such, it is possible to insert the appropriate TNT calls into the compiled layer called
by an interpreted language (see Figure 6).

The operation of a typical TNT application is shown in Figure [?]. The server creates a "Taskbag"
of work for clients. The clients are then executed remotely on a number of processors. The clients 
connect with the server and request a task or taskbag (a group of tasks). When they have completed 
their tasks they return the results back to the server and ask for more tasks. 

The TNT library was developed on Linux (RedHat 5.0) and tested on FreeBSD, NetBSD, and
Solaris. The entire library is written in C using TCP sockets. A server communicates with the 
clients using ports, which allows simultaneous servers to be active and listening to different port 
numbers. A server can call client functions and a client can call server functions. This enables 
the creation of hierarchies of servers. For example, a "root" server can partition a large taskbag 
into many sub-taskbags and distribute them to a collection of sub-servers. These sub-servers will 
then distribute tasks to the clients. This allows for a more efficient distribution of work across the 
cluster nodes. 

For the AMF application, the interpreted language calls a C routine which then sets up a server 
with tasks to be executed (Figure [?]). In this case, each task is a sub-set of all the test points. Clients 
are then started on other computers (or the same machine if it is a multiple processor system). At
startup each client receives the database (or a portion of the database) from the server. After 
receiving the data, each client then asks the server for a task to execute and returns the result. 

It is not possible to predict in advance how long a given task is going to take because of the 
non-linearity of the algorithm and because of heterogeneous capabilities and loads that may exist 
on the NOW. Fortunately, TNT is inherently load balancing in the sense that when a client finishes 
a task it requests additional work. If there are no tasks remaining then the client exits and frees 
up the processor. The processors that run faster will pick up more work and slower processors will 
pick up less work. 
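The load balancing behavior can be reproduced with a small deterministic simulation in which each client takes the next task from the bag the moment it becomes idle. This sketch does not use or imitate the TNT API; the function name, the task costs and the relative processor speeds are all hypothetical.

```python
import heapq


def simulate_taskbag(task_costs, speeds):
    """Simulate clients pulling tasks from a shared bag as they become idle.

    task_costs -- work per task, in arbitrary units
    speeds     -- relative speed of each client (work units per second)
    Returns (makespan, tasks_done_per_client).
    """
    # Priority queue of (time at which the client is next idle, client id).
    idle_at = [(0.0, cid) for cid in range(len(speeds))]
    heapq.heapify(idle_at)
    done = [0] * len(speeds)
    makespan = 0.0
    for cost in task_costs:
        t_idle, cid = heapq.heappop(idle_at)    # earliest idle client
        t_end = t_idle + cost / speeds[cid]     # it runs the next task
        done[cid] += 1
        makespan = max(makespan, t_end)
        heapq.heappush(idle_at, (t_end, cid))
    return makespan, done


# Twelve equal tasks on one fast and one slow processor.
makespan, done = simulate_taskbag([1.0] * 12, speeds=[2.0, 1.0])
print(makespan, done)
```

In this example the processor that is twice as fast ends up completing twice as many tasks, without the server needing to know anything about processor speeds in advance.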



4 Results 

The AMF has been extensively tested on simulated data to verify its accuracy and robustness 
[Kepner et al 1999]. The AMF is currently being applied to detect clusters of galaxies from the 
Sloan Digital Sky Survey (SDSS) [Kim et al 2000]. The recent parallel implementation of the AMF
has significantly increased the speed of the application.



4.1 Tests on simulated data 

In real data, neither the distances of the galaxies nor the position and sizes of the clusters are 
known. Tests on simulated data are the only opportunity to check the detection algorithm in a well 
understood environment. The test data consists of 72 simulated clusters with different sizes and 
distances placed in a simulated field of randomly distributed galaxies. The data was constructed 
to be consistent with what is expected from SDSS. The clusters range in size and distance so as to 
span the full range of expected clusters. The test data covers an area of 10 square degrees (1/1000 
of the SDSS) and contains approximately 100,000 points. 

To facilitate the subsequent analysis and interpretation of the results, the clusters were placed 
on an 8 by 9 grid. The cluster centers were separated by 0.4 degrees. The distribution of all the 
galaxies in angle is shown in Figure [?], where each column of clusters has the same size while each
row of clusters is at the same distance. From left to right the sizes are $\Lambda$ = 10, 20, 30, 40, 50,
100, 200, and 300. From bottom to top the distances are z = 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 
0.45, and 0.5. 
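The layout of the simulated clusters can be written down directly from the values listed above. The sketch below only enumerates the grid; the choice of origin, the dictionary keys and the function name are assumptions, and the galaxies themselves are not generated.

```python
# Sizes (left to right) and distances (bottom to top) quoted in the text.
SIZES = [10, 20, 30, 40, 50, 100, 200, 300]
DISTANCES = [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5]
SPACING_DEG = 0.4  # separation between cluster centers


def simulated_cluster_grid():
    """Return one entry per simulated cluster on the 8 by 9 grid."""
    clusters = []
    for row, z in enumerate(DISTANCES):
        for col, size in enumerate(SIZES):
            clusters.append({
                "x_deg": col * SPACING_DEG,   # origin placement is arbitrary
                "y_deg": row * SPACING_DEG,
                "size": size,
                "z": z,
            })
    return clusters


grid = simulated_cluster_grid()
print(len(grid), grid[0], grid[-1])
```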

The simulated data were run through the AMF and all clusters above the designated 5-$\sigma$ noise
limited threshold were detected (with no false detections). The angular positions of all the detected
clusters were well within the expected range. The estimated distance and size of each cluster are
shown in Figure [?]. As expected, the large and/or nearby clusters are detected and measured more
accurately than the small, far away clusters. These results indicate that the AMF can detect
clusters and estimate their locations without bias.

4.2 Tests on SDSS data



The Sloan Digital Sky Survey is a multi-decade, multi-institution effort to take a million by million 
pixel composite image of the night sky in five color bands. Analysis of the image data is expected 
to yield a database of 200,000,000 galaxies [Szalay et al 1999]. Test data has been taken since
1998 and been used to test all aspects of the SDSS software including the AMF (see Figure 10).
The AMF has detected all previously known clusters in the SDSS test data it has looked at. In
addition, the AMF performs at a level equal to or better than the other algorithms with which
it has been compared. Detailed results and comparison of various cluster finding algorithms are
presented elsewhere [Kim et al 2000].



Adapting the AMF to the SDSS was a sizeable effort. Without the layered software approach it
would have taken considerably longer, since a great deal of tuning was required to properly
set all the model parameters.

4.3 Scalability results 

A parallel implementation of the AMF is crucial for its application to the full SDSS database. 
Currently, the AMF requires 1-2 hours of CPU time (450 MHz Pentium II) to process a 10 square 
degree field. On a parallel NOW this can easily be sped up by a factor of 100, which will make 
processing the entire 10,000 square degree SDSS dataset feasible. 
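The scale of the full problem follows directly from these figures. As a rough worked estimate, assuming the per-field cost stays at 1-2 CPU hours across the survey and that the factor of 100 speedup is actually reached:

```latex
\begin{align*}
N_{\rm fields} &= \frac{10{,}000\ \mathrm{deg}^2}{10\ \mathrm{deg}^2} = 1000 \\
T_{\rm serial} &\approx 1000 \times (1\text{--}2\ \mathrm{h})
               = 1000\text{--}2000\ \mathrm{CPU\ hours} \\
T_{\rm parallel} &\approx \frac{T_{\rm serial}}{100}
               = 10\text{--}20\ \mathrm{hours}
\end{align*}
```

On a single processor the survey would therefore take on the order of months, while the parallel version brings it down to roughly a day.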

The results of running the parallel implementation on a NOW are shown in Table 1. These data 
show that the algorithm experiences good speedup on both heterogeneous and homogeneous NOWs. 
The primary bottlenecks to perfect speedups are the initial sending of data and the granularity of
the tasks. The impact of these can be seen in the execution schedules shown in Figures 11-13.

The total computation time consists of the time to do the computation plus the slack time due
to the granularity of the tasks

$$ T_{\rm comp} \propto \frac{1}{N_{\rm CPU}} + \frac{1}{N_{\rm task}} , \tag{24} $$

where $N_{\rm CPU}$ is the number of processors used in the NOW and $N_{\rm task}$ is the number of tasks the
job was broken into. The total time spent communicating is the time spent gathering the results
plus the time to send the initial data to each processor

$$ T_{\rm comm} \propto \frac{1}{N_{\rm CPU}} + N_X N_{\rm CPU} . \tag{25} $$

To achieve good scalability requires that the computation-to-communication ratio stay high as
$N_{\rm CPU}$ increases. In both cases the first term scales well while the second term doesn't. The second
(granularity) term in the computation time is due to the fact that some processors will finish first
and there will be no additional tasks for them to complete. This can be alleviated by simply
dividing the work up into sufficiently small tasks until this time is no longer important.

The second (startup) term in the communication time can be dealt with in several ways. First, 
the algorithm can be restructured so that each processor gets less data at startup. Second, the 
communication pattern can be remapped so that the initial data is distributed in multiple steps 
along a tree. Finally, since the initial data is the same for each processor, it should be possible 
to use a multicast to allow the data to be distributed everywhere in a single broadcast. Without 
alleviating this bottleneck the speedup is limited to approximately 100 on a 100 MBit/s class 
network. If the initial transmit bottleneck can be overcome, it should be possible to see speedups 
in the 5000 range. 

5 Summary and Conclusions



We have presented the Adaptive Matched Filter method for the automatic selection of clusters of 
galaxies from a galaxy database. The AMF is adaptive in two ways. First, the AMF uses a two 
step approach that first applies a coarse filter to find the clusters and then a fine filter to provide 
more precise estimates of the distance and size of each cluster. Second, the AMF uses the location 
of the data points as a "naturally" adaptive grid to ensure sufficient spatial resolution. 

Matched Filter techniques have a firm mathematical basis in statistical signal processing. The 
AMF uses a hierarchy of two filters (each mathematically correct under its assumptions). Combining
these filters allows the AMF to maximize computational performance and accuracy. The AMF
also provides two estimates for each cluster which can be compared as an additional check. This is
particularly effective for these filters because they react differently when given insufficient data.

The AMF relies heavily on models for both the cluster and the background field. This prior 
information is quite extensive and makes the AMF complex to implement and difficult to adapt to 
new data sets. To alleviate this coding challenge a hybrid coding approach was used to leverage the 
ease of use of interpreted languages along with the compute performance of compiled languages. 
In this way the complex task of testing model inputs and observing their effect through the data 
processing pipeline can be done quickly without sacrificing the compute efficiency necessary to 
complete the application in a timely manner. 

A further benefit of the hybrid approach is that it makes available to the compiled code a wide 
variety of parallel software libraries and tools. A parallel implementation is critical to the application
because matched filter techniques work by testing every possible location in the cluster space
for the presence of a cluster. This is a compute intensive operation, but also provides a high degree
of parallelism. The parallelization scheme used for the AMF application is a client/server approach
which is very effective on a Network-Of-Workstations. The TNT client/server software used is
lightweight and efficient, and provides a naturally load balancing and fault tolerant framework.

The AMF has been extensively tested on simulated data. These results indicate that it robustly 
and accurately detects clusters and estimates their positions while having few false positives. The 
AMF is now being applied to the first results of the Sloan Digital Sky Survey [Kim et al 2000].
These tests have shown that the AMF detects all previously known clusters in this data and
performs at or above the level of other cluster finding methods. The AMF hybrid application architecture has
proven effective in supporting the implementation of new datasets. The TNT based client/server
parallelization scheme has also demonstrated significant speedups which will make it feasible for
this application to address the entire SDSS when it becomes available.

N_CPU   N_CPU (eff)   N_task   Total Time (s)   Speedup   Efficiency
  1         1.0           1         5451            -          -
 13        10.5          50          700           7.8        74%
 13        10.5         200          650           8.4        80%
 32        32.0         200          200          27.3        85%

Table 1: AMF Execution Times. Execution times in seconds for various numbers of processors 
and various numbers of tasks. N_CPU is the number of processors used. N_CPU (eff) is the number of
processors weighted by their clock speed. N_task is how many sub-tasks the problem was broken into.
The increased parallel efficiency between rows two and three is due to the use of more tasks which 
results in less slack time due to the task granularity. The increased parallel efficiency between rows 
three and four is due to the use of a NOW with a higher performing interconnect which reduces 
the time it takes to initially transmit the data. 
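
The speedup and efficiency columns of Table 1 can be checked from the execution times alone. The following is a minimal sketch, assuming that speedup is the single-processor time divided by the parallel time and that efficiency is the speedup divided by N_cpu(eff). These definitions are inferred from the tabulated values, and the function names are illustrative only.

```python
def speedup(t_serial, t_parallel):
    """Ratio of the single-processor time to the parallel time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_cpu_eff):
    """Speedup as a fraction of the clock-speed-weighted processor count."""
    return speedup(t_serial, t_parallel) / n_cpu_eff


T_SERIAL = 5451.0  # seconds, row one of Table 1

# (N_cpu(eff), total time in seconds) for rows two to four of Table 1
ROWS = [(10.5, 700.0), (10.5, 650.0), (32.0, 200.0)]

if __name__ == "__main__":
    for n_eff, t_parallel in ROWS:
        s = speedup(T_SERIAL, t_parallel)
        e = efficiency(T_SERIAL, t_parallel, n_eff)
        print(f"N_cpu(eff)={n_eff:5.1f}  speedup={s:5.1f}  efficiency={e:4.0%}")
```

Under these assumed definitions the computed values agree with the tabulated speedups (7.8, 8.4, 27.3) and efficiencies (74%, 80%, 85%) to the precision shown.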
Figure 1: Cluster Image. The cluster 0024+1654 as seen with the Hubble Space Telescope
[Colley et al 1996]. The reddish circular patches in the center are Spheroidal galaxies each containing
10 stars. The flatter reddish patches are Spiral galaxies like our own Milky Way. The blue
arcs around the edge are from a distant background galaxy that has been gravitationally "lensed"
by the cluster. This cluster has around one thousand members and lies at a distance of 4.5 billion
light years. [Note: our nearest neighbor the Andromeda galaxy is 1.5 million light years away.]
Figure 2: Filament Simulation. Isosurface view of a 3D simulation (courtesy of Michael
Norman) of the Universe showing the sheets and filaments of matter along which galaxies form. 
Clusters of galaxies tend to form where two filaments cross. The simulation volume is approximately 
500 light years on a side. 
Figure 3: Coarse and Fine Filters. Plots of the likelihood and cluster size (A) as a function
of distance as computed from the coarse and fine matched filters. The input cluster has a true
distance of 0.35 (4.5 billion light years) and size of A = 100. The coarse likelihood peaks at a
distance of 0.31 and estimates the size to be A = 102. The fine likelihood peaks at a distance of
0.35 and estimates the size to be A = 98. In general, the fine likelihood provides better distance
and size estimates but at approximately 10 times the computational cost.
Figure 4: Software Pipeline. Schematic of the AMF software processing pipeline. The processing
can accept either real or simulated data. The data is broken up into survey areas and fed into
the AMF. The AMF does preliminary data checking followed by the coarse filter. The results of 
the coarse filter are used to select the locations of clusters which are then fed into the fine filter 
to provide more accurate estimates. The coarse filter is the dominant step in terms of computing 
cost. 
Figure 5: Single CPU Application Architecture. Application architecture before implementation
on a Network-Of-Workstations. GUI and other "high level" operations are written in the
interpreted layer, which calls the compute kernel (the coarse filter in the AMF application) written
in a compiled language.
Figure 6: NOW Application Architecture. Application architecture after implementation on
a NOW. An additional "TNT" layer has been added to the compute kernel which invokes and
manages multiple copies of the compute kernel (coarse filter) on a NOW.
Figure 7: TNT Application. A typical TNT application consists of a server with many clients,
communicating via TCP/IP. The server: places tasks into the Taskbag; listens on a specific port for
requests for tasks from clients; dispatches tasks to requesting clients; accepts results from clients;
monitors the status of clients and re-assigns the tasks of dropped clients; and, when all tasks are
completed, returns the results to the main program. The client(s) loop over the Taskbag until it is
empty. On each iteration a client will: send a request for work to the server on a specific port; read
the data sent by the server over the network; call the compute kernel with the data; and send the
results of the computation back to the server over the network. For the AMF, the tasks correspond to
sub-sets of the test locations and the compute kernel is the coarse likelihood evaluation.
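
The Taskbag pattern described in Figure 7 can be illustrated with a short sketch. This is not the TNT software or its interface: it assumes an in-process stand-in in which Python threads play the role of clients and a shared queue replaces the TCP/IP exchange with the server. The names run_taskbag and toy_kernel are hypothetical, and the monitoring and re-assignment of dropped clients is omitted.

```python
import queue
import threading


def run_taskbag(tasks, kernel, n_clients=4):
    """Run kernel(task) for every task using a shared bag of tasks.

    Each client takes the next task whenever it is idle, so faster
    clients naturally complete more tasks (dynamic load balancing).
    """
    taskbag = queue.Queue()
    for task_id, task in enumerate(tasks):
        taskbag.put((task_id, task))

    results = {}
    lock = threading.Lock()

    def client():
        while True:
            try:
                task_id, task = taskbag.get_nowait()  # request work
            except queue.Empty:
                return  # the bag is empty, so the client exits
            value = kernel(task)  # call the compute kernel
            with lock:
                results[task_id] = value  # hand the result back

    clients = [threading.Thread(target=client) for _ in range(n_clients)]
    for c in clients:
        c.start()
    for c in clients:
        c.join()

    # Return the results in task order once all tasks are completed.
    return [results[i] for i in range(len(tasks))]


def toy_kernel(locations):
    """Hypothetical stand-in for the coarse likelihood evaluation."""
    return sum(x * x for x in locations)


if __name__ == "__main__":
    # Split 20 test locations into 5 tasks of 4 locations each.
    locations = list(range(20))
    tasks = [locations[i:i + 4] for i in range(0, 20, 4)]
    print(run_taskbag(tasks, toy_kernel))
```

Because clients pull work rather than having it assigned in advance, no client needs to know how many other clients exist or how fast they are.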
Figure 8: Simulated Data. Angular positions of simulated data containing a uniform field and
72 clusters arranged in an 8 by 9 grid. Each column of clusters has the same size while each row of
clusters is at the same distance. From left to right the sizes are A = 10, 20, 30, 40, 50, 100, 200,
and 300. From bottom to top the distances are z = 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, and 0.5.
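
The grid of input clusters in Figure 8 can be enumerated directly from the listed sizes and distances. The sketch below only reproduces the (size, distance) pairs; it does not generate galaxy positions, and the function name cluster_grid is illustrative.

```python
from itertools import product

# Cluster sizes A, left to right (one per column).
SIZES = [10, 20, 30, 40, 50, 100, 200, 300]
# Cluster distances z, bottom to top (one per row).
DISTANCES = [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5]


def cluster_grid(sizes=SIZES, distances=DISTANCES):
    """Return one (row, column, size, distance) tuple per input cluster."""
    return [
        (row, col, size, z)
        for (row, z), (col, size) in product(enumerate(distances), enumerate(sizes))
    ]


if __name__ == "__main__":
    grid = cluster_grid()
    print(len(grid))  # 8 sizes x 9 distances = 72 clusters
```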



[Plot for Figure 9: vertical axis from 0.00 to 0.60; horizontal axis "Cluster Size" with ticks at 10 and 100.]



Figure 9: Results from Simulated Data. The size and distance of each of the input clusters 
(boxes) with the short lines indicating the corresponding values determined from the AMF fine 
filter. The long curved line indicates the approximate detection limit. All the clusters above the 
detection limit (i.e. to the right of the line) are found with no false detections. As expected the 
size and distance estimates are best for the largest and nearest clusters, while clusters near the 
detection limit have poorer estimates. 
Figure 10: Likelihood Map of Sloan Data. A strip of the Sloan Digital Sky Survey data
(courtesy of SDSS Collaboration) was processed through the AMF. This strip covers 30 square
degrees and contains nearly 200,000 galaxies. The image shows the projected coarse likelihood map
of this strip (red is low, blue is higher, and green is highest). The dotted circles denote
cluster detections.
Figure 11: Heterogeneous NOW (50 Tasks). Task schedule for the coarse filter running on
a heterogeneous NOW consisting of seven single processor and two quad processor workstations.
The number in each box shows how many tasks each processor has completed. Task 0 is the initial
transmittal of the data. This calculation achieves a speedup of 7.8 out of 10.5 (74%). The barriers
to full speedup are the initial transmission and the slack time at the end due to the task granularity.
Figure 12: Heterogeneous NOW (200 Tasks). Same as Figure 11 except that the work has
been broken up into 200 tasks, which reduces the slack time at the end of the calculation, resulting
in a speedup of 8.4 out of 10.5 (80%).
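
The effect of task granularity on the slack time can be illustrated with a small scheduling model. This is a simplified sketch, not a model of the measured runs: it assumes equal-sized tasks, processors that take the next task as soon as they are idle, and no data transmission cost. The function names and the example processor speeds are hypothetical.

```python
import heapq


def makespan(total_work, n_tasks, speeds):
    """Finish time when idle processors repeatedly take the next task.

    total_work is split into n_tasks equal pieces; a processor with
    speed s needs (total_work / n_tasks) / s time units per task.
    """
    task_work = total_work / n_tasks
    # Heap of (time at which the processor becomes free, processor index).
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_tasks):
        t, i = heapq.heappop(free_at)  # earliest idle processor asks for work
        t_done = t + task_work / speeds[i]
        finish = max(finish, t_done)
        heapq.heappush(free_at, (t_done, i))
    return finish


def parallel_efficiency(total_work, n_tasks, speeds):
    """Ideal finish time (perfect balance) divided by the actual finish time."""
    ideal = total_work / sum(speeds)
    return ideal / makespan(total_work, n_tasks, speeds)


if __name__ == "__main__":
    # Two equal processors: 3 tasks leave one processor idle at the end,
    # while 6 tasks divide evenly.
    for n_tasks in (3, 6):
        print(n_tasks, parallel_efficiency(3.0, n_tasks, [1.0, 1.0]))
```

In this toy case, three tasks on two equal processors give an efficiency of 0.75 because one processor sits idle while the other finishes the last task, whereas six tasks give 1.0.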
Figure 13: Homogeneous NOW (200 Tasks). Same as Figure 12 but run on sixteen identical
dual processor workstations with a higher performance network. The better network reduces the
initial transmit time (Task 0) and results in a speedup of 27.3 out of 32 (85%).






Biographies 



Jeremy Kepner received his B.A. in Astrophysics from Pomona College (Claremont, CA). He 
obtained his Ph.D. focused on Computational Science from the Dept. of Astrophysics at Princeton 
University in 1998, after which he joined MIT Lincoln Lab. His research has addressed the devel- 
opment of parallel algorithms and tools and the application of massively parallel computing to a 
variety of data intensive problems. E-mail: jvkepner@astro.princeton.edu or kepner@ll.mit.edu

Rita Seung Jung Kim (Rita S.J. Kim) is completing her Ph.D. at Princeton University in Astrophysical
Sciences. She received her B.S. in Astronomy at Seoul National University (Seoul, Korea),
and spent one year in Paris at the Institut d'Astrophysique de Paris. Her thesis work concentrates
on the properties of cluster galaxies, and she has also worked on statistical modeling of the large scale
structure of the Universe. E-mail: rita@astro.princeton.edu






